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STATISTICAL  COMPRESSIVE  SENSING  OF  GAUSSIAN  MIXTURE  MODELS 


Guoshen  Yu  and  Guillermo  Sapiro 
ECE,  University  of  Minnesota,  Minneapolis,  MN  55455,  U.S.A. 


ABSTRACT 

A  new  framework  of  compressive  sensing  (CS),  namely  statistical  compres¬ 
sive  sensing  (SCS),  that  aims  at  efficiently  sampling  a  collection  of  signals 
that  follow  a  statistical  distribution  and  achieving  accurate  reconstruction 
on  average,  is  introduced.  For  signals  following  a  Gaussian  distribution, 
with  Gaussian  or  Bernoulli  sensing  matrices  of  ^{k)  measurements, 
considerably  smaller  than  the  ^(^log(A/^))  required  by  conventional  CS, 
where  N  is  the  signal  dimension,  and  with  an  optimal  decoder  implemented 
with  linear  filtering,  significantly  faster  than  the  pursuit  decoders  applied 
in  conventional  CS,  the  error  of  SCS  is  shown  tightly  upper  bounded  by 
a  constant  times  the  ^-best  term  approximation  error,  with  overwhelming 
probability.  The  failure  probability  is  also  significantly  smaller  than  that 
of  conventional  CS.  Stronger  yet  simpler  results  further  show  that  for 
any  sensing  matrix,  the  error  of  Gaussian  SCS  is  upper  bounded  by  a 
constant  times  the  ^-best  term  approximation  with  probability  one,  and  the 
bound  constant  can  be  efficiently  calculated.  For  signals  following  Gaussian 
mixture  models,  SCS  with  a  piecewise  linear  decoder  is  introduced  and 
shown  to  produce  for  real  images  better  results  than  conventional  CS  based 
on  sparse  models. 

Index  Terms —  Compressive  sensing,  Gaussian  mixture  models 

I.  INTRODUCTION 

Compressive  sensing  (CS)  attempts  to  achieve  accurate  signal  recon¬ 
struction  while  sampling  signals  at  a  low  sampling  rate,  typically  far 
smaller  than  the  Nyquist/Shannon  rate.  Let  x  G  be  a  signal  of  interest, 
d>  G  a  non-adaptive  sensing  matrix  {encoder),  consisting  of  M  <C  V 

measurements,  y  =  Ox  G  R^  a  measured  signal,  and  A  a  decoder  used  to 
reconstruct  x  from  Ox.  CS  develops  encoder-decoder  pairs  (O,  A)  such  that 
a  small  reconstruction  error  x  —  A(Ox)  can  be  achieved. 

Reconstructing  x  from  Ox  is  an  ill-posed  problem  whose  solution 
requires  some  prior  information  (model)  on  the  signal.  Instead  of  the 
frequency  band-limit  signal  model  assumed  in  classic  Shannon  sampling 
theory,  conventional  CS  adopts  a  sparse  signal  model  (manifold  models 
have  been  considered  as  well  [4],  [8]),  i.e.,  there  exists  a  dictionary, 
typically  an  orthogonal  basis  W  G  R^^^,  a  linear  combination  of  whose 
columns  generates  an  accurate  approximation  of  the  signal,  x  %,  with 
the  coefficients  a[m],  \  <  m  <  N,  having  their  amplitude  decaying  fast 
after  being  sorted.  For  signals  following  the  sparse  model,  it  has  been 
shown  that  using  some  random  sensing  matrices  such  as  Gaussian  and 
Bernoulli  matrices  d>  with  M  =  ^(^log(A/^))  measurements,  and  an  l\ 
minimization  or  a  greedy  matching  pursuit  decoder  A  promoting  sparsity, 
with  high  probability  CS  leads  to  accurate  signal  reconstruction.  The 
obtained  approximation  error  is  tightly  upper  bounded  by  a  constant  times 
the  ^-best  term  approximation  error,  the  minimum  error  that  one  may 
achieve  by  keeping  the  k  largest  coefficients  in  a  [6],  [9],  [10]. 

The  present  paper  introduces  a  novel  framework  of  CS,  namely  statistical 
compressive  sensing  (SCS).  As  opposed  to  conventional  CS  that  deals  with 
one  signal  at  a  time,  SCS  aims  at  efficiently  sampling  a  collection  of  signals 
and  having  accurate  reconstruction  on  average.  Instead  of  restricting  to 
sparse  models,  SCS  works  with  general  Bayesian  models.  Assuming  that 
the  signals  x  follow  a  distribution  with  probability  density  function  (pdf) 
/?(x),  SCS  designs  encoder-decoder  pairs  (0,A)  so  that  the  average  error 

£x||x-A($x)||x  =  y  ||x-A(Ox)||xp(x)Jx,  (1) 

where  ||  •  ||x  is  a  norm,  is  small.  As  an  important  example,  SCS  with 
Gaussian  models  is  here  shown  to  have  improved  performance  (bounds) 
relative  to  conventional  CS,  the  signal  construction  calculated  with  an 
optimal  decoder  A  implemented  via  a  fast  linear  filtering.  Moreover,  for 


Gaussian  mixture  models  (GMM),  SCS  with  a  piecewise  linear  decoder  is 
introduced  and  shown  to  be  very  effective. 

The  motivation  of  SCS  with  Gaussian  models  is  twofold.  First,  con¬ 
trolling  the  average  error  over  a  collection  of  signals  is  useful  in  signal 
acquisition,  not  only  because  one  is  often  interested  in  acquiring  multiple 
signals  in  real  applications,  but  also  because  more  effective  processing  of 
an  individual  signal,  an  image  or  a  sound  for  example,  is  usually  achieved 
by  dividing  the  signal  in  local  subparts,  patches  or  short-time  windows  for 
instance,  and  a  signal  can  be  regarded  as  a  collection  of  subpart  signals  [17], 
[18].  In  addition,  Gaussian  mixture  models  (GMM),  which  model  signals  or 
subpart  signals  with  a  collection  of  Gaussians,  have  been  shown  effective 
in  describing  real  signals,  leading  to  excellent  results  in  image  inverse 
problems  [18]  and  missing  data  estimation  [13]. 

After  reviewing  the  optimal  decoder  for  Gaussian  signals  in  Section  II, 
a  quick  numerical  check  of  the  good  performance  of  Gaussian  SCS  is 
first  given  in  Section  III.  Section  IV  analyzes  this  performance  following 
a  similar  mathematical  approach  as  the  one  adopted  in  conventional  CS 
performance  analysis  [9].  This  result  shows  that  with  the  same  random 
matrices  as  in  conventional  CS,  but  with  a  considerably  reduced  number 
M  =  ^{k)  of  measurements,  and  with  the  optimal  decoder  implemented 
with  linear  filtering,  significantly  faster  than  the  decoders  applied  in 
conventional  CS,  the  average  error  of  Gaussian  SCS  is  tightly  upper 
bounded  by  a  constant  times  the  ^-best  term  approximation  error  with 
overwhelming  probability,  the  failure  probability  being  orders  of  magnitude 
smaller  than  that  of  conventional  CS.  Section  V  further  shows  stronger 
yet  simpler  results:  For  any  sensing  matrix,  the  average  error  of  Gaussian 
SCS  is  upper  bounded  by  a  constant  times  the  ^-best  term  approximation 
with  probability  one,  and  the  bound  constant  can  be  efficiently  calculated. 
Section  VI  introduces,  for  SCS  with  GMM,  a  piecewise  decoding  scheme 
based  on  an  efficient  maximum  a  posteriori  expectation-maximization 
(MAP-EM)  algorithm  following  [18],  and  shows  better  results  on  real 
images  than  those  obtained  with  conventional  CS. 

In  the  rest  of  this  paper,  we  will  assume  without  loss  of  generality  Gaus¬ 
sian  signal  models  x  ^  ./L(0,S),  with  mean  zero  and  diagonal  covariance 
matrix  S  =  diag[X\ , . . . ,  with  Ai  >  A2  >  •  •  •  >  the  sorted  eigenvalues. 
We  can  always  center  the  data  with  respect  to  the  Gaussian  distribution 
and  make  a  basis  change  with  principal  component  analysis  (PC A)  [18]. 
For  Gaussian  and  Bernoulli  matrices  that  are  known  to  be  universal, 
analyzing  CS  in  the  canonical  basis  or  the  PCA  basis  is  equivalent  [2]. 
The  Gaussian  distributions  are  assumed  to  be  full  rank,  i.e.,  >  0,  since 

a  degenerated  Gaussian  can  be  regarded  as  a  full-rank  Gaussian  within  a 
reduced  dimension. 

II.  OPTIMAL  DECODER  FOR  GAUSSIAN  SCS 

It  is  well-known  that  the  optimal  decoders  for  Gaussian  signals  are 
calculated  with  linear  filtering: 

Theorem  1.  [12]  Let  x  G  R^  be  a  random  vector  with  prior  pdf  JY {^,^), 
and  O  G  R^x^  M  <N,  be  a  sensing  matrix.  From  the  measured  signal 
y  =  d>x  G  R^,  the  optimal  decoder  A  that  minimizes  the  mean  square 
error  (MSB)  Ex[||x  —  A(d>x) II2]  =  min fEx[\\x  — f{^x)\\ 2],  as  well  as  the 
mean  absolute  error  (MAE)  Exdix  —  A(Ox)||i]  =  minyEx[||x  — /(Ox)||i], 
where  f  is  any  mapping  from  R^  ^  R^,  is  obtained  with  a  linear  MAP 
estimator, 

A  (d>x)  =  arg  max  (x  |  y )  =  Sd>^  (OSd>^ )  “  ^  (d>x) ,  (2) 

and  the  resulting  error  rf  =x  —  A  (Ox)  is  Gaussian  with  mean  zero  and 
with  covariance  matrix  2^^  =  Exfqr]^]  =  S  —  SO^(OSO^)“^OS,  whose 
trace  yields  the  MSE  of  SCS 

£'x[||x-A(<I>x)||^]  =  r/-(S-S<I>^(<I>S4>^)“‘<I>S).  (3) 


In  contrast  to  conventional  CS,  for  which  the  l\  minimization  or  greedy 
matching  pursuit  decoders,  calculated  with  iterative  procedures,  have  been 
shown  optimal  under  certain  conditions  on  O  and  the  signal  sparsity  [6], 
[10],  Gaussian  SCS  enjoys  the  advantage  of  having  an  optimal  decoder  (2) 
calculated  via  a  linear  filtering  for  any  O. 

Corollary  1,  If  a  random  matrix  O  G  is  drawn  independently  to 

sense  each  x,  with  all  the  other  conditions  as  in  Theorem  1,  the  MSB  of 
SCS  is 

£'x,<i.||x-A(Ox)||^  =£'<t[rr(S-S0^(0S<I>^)“'0S)].  (4) 

The  MSE  of  Gaussian  SCS  has  closed-forms  (3),  (4),  however,  a 
mathematical  analysis  of  (3)  and  (4)  seems  uneasy  due  to  the  complexity 
of  the  involved  matrix  expression.  The  next  section  evaluates  these  values 
through  Monte  Carlo  simulations,  preceding  the  theoretical  bounds  later 
developed. 

III.  PERFORMANCE  OF  GAUSSIAN  SCS 
-  A  NUMERICAL  ANALYSIS  AT  FIRST 

This  section  numerically  evaluates  the  MSE  of  Gaussian  SCS,  and  com¬ 
pares  it  with  the  minimal  MSE  generated  by  the  best  ^-term  approximation. 
Under  the  Gaussian  signal  model  x~./K(0,S)  with  sorted  eigenvalues 
^  ^  it  is  well-known  that  the  best  k-term  approximation 

error, 

aj.({x})x  =£x||x-xa:||x  and  ct<;({x})^  =£x||x-xa:||2,  (5) 

is  obtained  with  a  linear  projection  to  the  first  k  entries,  i.e.,  XK[n]  =  x[n], 
\/n  e  K  ,k},  and  XK[n]  =  0  otherwise  [14].  Note  that  a^({x})2  = 

This  best  k-term  approximation  sensing  is  impractical  with 
GMM,  since  the  Gaussian  assignment  of  a  signal  is  unknown  at  the 
acquisition  moment,  GMM  describing  real  data  much  better  than  a  single 
Gaussian  model  [18]. 

A  power  decay  of  the  eigenvalues  [14], 

kfn  =  m~^ ,  1  <  m  <  A,  (6) 

N  =  64,  is  assumed  in  the  Monte  Carlo  simulation.  An  independent  random 
Gaussian  matrix  realization  O  is  applied  to  sense  each  signal  x  [9],  and  (4) 
is  used  to  calculate  the  MSE  of  SCS. 

Eigures  1  (a)  and  (c)-top  plot  the  MSE  of  SCS  and  that  of  the  best  k- 
term  approximation,  as  well  as  their  ratio  as  a  function  of  k  for  Gaussian 
signals  with  a  typical  eigenvalue  decay  a  =  —  3.  With  k  increasing,  both 
MSEs  decrease,  their  ratio  being  almost  constant  at  about  3.7.  The  same 
is  plotted  in  figures  1  (b)  and  (c)-bottom,  with  k  fixed  at  a  typical  value  of 
10,  and  a  varying  from  1  to  4.  When  a  increases,  the  eigenvalues  decay 
faster,  and  the  MSEs  for  both  methods  decrease.  The  ratio  increases  almost 
linearly  with  a. 


Fig.  1.  Comparison  of  the  MSE  of  SCS  and  that  of  the  best  k-term  approxi¬ 
mation  for  Gaussian  signals.  See  text  for  details. 

These  results  indicate  a  good  performance  of  Gaussian  SCS,  its  MSE 
is  only  a  small  number  of  times  larger  than  that  of  the  best  ^-term 
approximation.  The  next  sections  provide  mathematical  analysis  of  this 
performance. 

IV.  PERFORMANCE  BOUNDS  OF  GAUSSIAN  SCS 

Following  [9],  this  section  shows  that  with  Gaussian  and  Bernoulli 
random  matrices  of  ^{k)  measurements,  considerably  smaller  than  the 
^{k\og{N/k))  required  by  conventional  CS,  the  average  error  of  Gaus¬ 
sian  SCS  is  tightly  upper  bounded  by  a  constant  times  the  k-best  term 
approximation  with  overwhelming  probability,  the  failure  probability  being 
orders  of  magnitude  smaller  than  that  of  conventional  CS.  The  proofs  of  the 
theorems  will  not  be  given  due  to  the  space  limitations.  We  consider  only 
the  encoder-decoder  pairs  (0,A)  that  preserve  Ox,  i.e.,  0(A(0x))  =  Ox, 
satisfied  by  the  optimal  A  in  (2)  for  Gaussian  signals  x,  VO. 


IV-A.  From  Null  Space  Property  to  Instance  Optimality 

The  instance  optimality  in  expectation  bounds  the  average  error  of  SCS 
with  a  constant  times  that  of  the  best  k-term  approximation  (5),  defining 
the  desired  SCS  performance: 

Definition  1.  We  say  that  (0,A)  is  instance  optimal  in  expectation  of  order 
k  in  II  •  ||x,  with  a  constant  Cq,  if 

£^x,(<i>)l|x-A(Ox)||x  <Coa^({x})x,  (7) 

the  expectation  with  respect  to  x,  and  to  O  if  one  random  O  is  drawn 
independently  for  each  x.  Similarly,  the  MSE  instance  optimality  of  order 

k  is  defined  as  _  _ 

£'x,(<i.)l|x-A(Ox)||^  <CoOi({x})^.  (8) 

The  null  space  property  in  expectation  defined  next  will  be  shown 
equivalent  to  the  instance  optimality  in  expectation. 

Definition  2.  We  say  that  O  in  (0,A)  has  the  null  space  property  in 
expectation  of  order  k  in  ||  •  ||x,  with  constant  C,  if 

£x.(®)lhl|x<Cat({»?})x,  Vr7=x-A(<I.x).  (9) 

the  expectation  with  respect  to  x,  and  to  O  if  one  random  O  is  drawn 
independently  for  each  x.  Note  that  rf  G  Null(O).  Similarly,  the  MSE  null 
space  property  of  order  k  is  defined  as 

E,,,(^)\\n\\2<COk{{r)})l,  Vr7=x-A(<I.x).  (10) 

Theorem  2.  Given  an  M  x  N  matrix  O,  a  norm  ||  •  \\x,  and  a  positive 
integer  k<N,  a  sufficient  condition  that  there  exists  a  decoder  A  such  that 
the  instance  optimality  in  expectation  (7)  holds  with  constant  Cq,  is  that  the 
null  space  property  in  expectation  (9)  holds  with  C  =  CqI2  for  this  (O,  A). 
A  necessary  condition  is  the  null  space  property  in  expectation  (9)  with 
C  =  Cq.  Similar  results  hold  between  MSE  instance  optimality  (8)  and  null 
space  property  (10),  with  the  constant  C  =  Co/4  in  the  sufficient  condition. 

Comparing  to  conventional  CS  that  requires  the  null  space  property  to 
hold  with  the  best  2k- term  approximation  error  [9],  the  requirement  for 
Gaussian  SCS  is  relaxed  to  k,  thanks  to  the  linearity  of  the  best  k-term 
approximation  for  Gaussian  signals. 

Theorem  2  proves  the  existence  of  the  decoder  A  for  which  the 
instance  optimality  in  expectation  holds  for  (d>,A),  given  the  null  space 
property  in  expectation.  However,  it  does  not  explain  how  such  decoder  is 
implemented.  The  following  Corollary,  a  direct  consequence  of  theorems  1 
and  2,  shows  that  for  Gaussian  signals  the  optimal  decoder  (2)  leads  to  the 
instance  optimality  in  expectation. 

Corollary  2.  Eor  Gaussian  signals  x  ~  c/C(0,S),  if  an  MxN  sensing 
matrix  d>  satisfies  the  null  space  property  in  expectation  (9)  of  order  kin  ||  • 
111,  with  constant  Co/2,  or  the  MSE  null  space  property  (10)  of  order  k  with 
constant  Co/4,  then  the  optimal  and  linear  decoder  A  =  Sd>^(d>SO^)“^ 
satisfies  the  instance  optimality  in  expectation  (7)  in  ||  •  ||i,  or  the  MSE 
instance  optimality  (8). 

IV-B.  From  RIP  to  Null  Space  Property 

The  Restricted  Isometry  Property  (RIP)  of  a  matrix  measures  its  ability  to 
preserve  distances,  and  is  related  to  the  null  space  property  in  conventional 
CS  [7],  [10].  The  new  linear  RIP  of  order  k  restricts  the  requirement  of 
conventional  RIP  of  order  k,  to  a  union  of  k-dimensional  linear  subspaces 
with  consecutive  supports: 

Definition  3.  Let  k  <  N  be  a  positive  integer.  Let  K\  define  a  linear 
subspace  of  functions  with  support  in  the  first  k  indices  in  [1,A],  K2  a 
linear  subspace  of  functions  with  support  in  the  next  k  indices,  and  so  on. 
The  functions  in  the  last  linear  subspace  Kj  defined  this  way  may  have 
support  with  less  than  k  indices.  An  MxN  matrix  O  is  said  to  have  linear 
RIP  of  order  k  with  constant  3  if 

(1-5)||x||2  <  ||Ox||2  <  (1  +  6)||x||2,  \/xeuj^iKj.  (11) 
The  linear  RIP  is  a  special  case  of  the  block  RIP  [11],  with  block  sparsity 
one  and  blocks  having  consecutive  support  of  the  same  size. 


The  following  theorem  relates  the  linear  RIP  (11)  of  a  matrix  O  to  its 
null  spaee  property  in  expeetation  (9). 

Theorem  3.  Let  x  G  be  a  random  vector  that  follows  a  certain 
distribution.  Let  ^  be  an  MxN  matrix  that  satisfies  the  linear  RIP  of  order 
2k  with  d  <\,  and  let  A  be  a  decoder.  Let  rf  =  x  — A(Ox).  Assume  further 
that  £x,(o)  I  ^  W I  decays  in  n:  £'x,(o)  I  ^  + 1]  I  <  ^x,(o)  I P  W  I’  Then 

O  satisfies  the  null  property  in  expectation  of  order  k  in  ||  •  ||i  (9),  with 
constant  Cq  =  1 

For  Gaussian  signals  x  G  ./F(0,S),  with  O  Gaussian  or  Bernoulli 
matrices,  one  realization  drawn  independently  for  each  x,  and  with  A  the 
optimal  decoder  (2),  the  decay  of  £'x,o|^[^]|  assumed  in  in  Theorem  3  is 
verified  through  Monte  Carlo  simulations. 

IV- C.  From  Random  Matrices  to  RIP 

The  next  Theorem  shows  that  Gaussian  and  Bernoulli  matrices  satisfy 
the  RIP  for  one  sub  space  with  overwhelming  probability. 

Theorem  4.  [2]  Let  ^  be  a  random  matrix  of  size  MxN  drawn  according 
to  any  distribution  that  satisfies  the  concentration  inequality 

Pr(||4>x||^-||x||^|>e||x||^)<2e-"‘^»(^/2)^  V  x  €  (12) 

where  0  <  b  <  1,  and  co(b/2)  >  0  is  a  constant  depending  only  on  b/2. 
Then  for  any  set  K  ^  {\^. ..  ^N}  with  \K\=k<  M,  we  have 

(l-b)||x||2  <  ||d)x||2  <  (l  +  b)||x||2,  V  xe^K,  (13) 

where  is  the  set  of  all  vectors  in  that  are  zero  outside  of  K,  with 
probability  greater  than  or  equal  to  1  —  2(12/b)^^“‘^o(5/2)M 
Bernoulli  matrices  satisfy  the  concentration  inequality  (12). 

The  linear  RIP  of  order  ^  (11)  requires  that  (13)  holds  for  N/k<N  sub¬ 
spaces.  The  next  Theorem  follows  from  Theorem  4  by  simply  multiplying 
by  N  the  probability  that  the  RIP  fails  to  hold  for  one  subspace. 

Theorem  5.  Suppose  that  M,  N  and  0  <  b  <  1  are  given.  Let  O  be 
a  random  matrix  of  size  MxN  drawn  according  to  any  distribution 
that  satisfies  the  concentration  inequality  (12).  Then  there  exist  constants 
ci,C2  >0  depending  only  on  b  such  that  the  linear  RIP  of  order  k  (11) 
holds  with  probability  greater  than  or  equal  to  1  —  2Ne~^^^  for  d>  with 
the  prescribed  b  and  k<c\M. 

Comparing  with  conventional  CS,  where  the  null  space  property  requires 
that  the  RIP  (13)  holds  for  (^)  subspaces  [2],  [7],  [10],  the  number  of 
subspaces  in  the  linear  RIP  (11)  is  sharply  reduced  to  N /k  for  Gaussian 
SCS.  In  consequence,  with  the  same  number  of  measurements  M,  the 
probability  that  a  Gaussian  or  Bernoulli  matrix  O  satisfies  the  linear  RIP  is 
substantially  higher  than  that  for  the  conventional  RIP.  Equivalently,  given 
the  same  probability  that  d>  satisfies  the  linear  RIP  or  the  conventional 
RIP  of  order  k,  the  required  number  of  measurements  for  the  linear  RIP 
is  M  ^  ^{k),  substantially  smaller  than  the  M  ^  ^(^log(V/^))  required 
for  the  conventional  RIP.  Similar  improvements  have  been  obtained  with 
model-based  CS  that  assumes  structured  sparsity  on  the  signals  [3]. 

With  the  results  above,  we  have  showed  that  for  Gaussian  signals, 
with  sensing  matrices  satisfying  the  linear  RIP  (11)  of  order  2k,  for 
example  Gaussian  or  Bernoulli  matrices  with  ^{k)  rows,  with  overwhelm¬ 
ing  probability,  and  with  the  optimal  decoder  (2),  SCS  leads  to  the 
instance  optimality  in  expectation  of  order  ^  in  ||  •  ||i  (7),  with  constant 
Co  =  2(1  Y^).  By  the  definition  of  CS,  k^^^  is  typically  small. 

V.  PERFORMANCE  BOUNDS  OE  GAUSSIAN  SCS 
WITH  RIP  IN  EXPECTATION 

This  section  shows  that  with  an  RIP  in  expectation,  a  matrix  isometry 
property  more  adapted  to  SCS,  the  Gaussian  SCS  MSE  instance  optimal¬ 
ity  (8)  of  order  k  and  constant  Cq,  holds  in  the  I2  norm  with  probability 
one  for  any  matrix.  Cq  has  a  closed-form  and  can  be  easily  computed 
numerically. 

Definition  4.  Let  x  G  be  a  random  vector  that  follows  a  certain 
distribution.  Let  ^  be  an  M  x  N  sensing  matrix  and  let  A  be  a  decoder. 


Let  Y]  =x  — A(d>x).  O  in  (d>,A)  is  said  to  have  RIP  in  expectation  in  K 
with  constant  ck  if 

Ex,(<s>)\\‘i'm\\2  =  CKE^X‘^-)\\m\\h  V  T]  =X-A((px),  (14) 

where  K  C  { 1 ,  •  •  • ,  N},  px  C  is  the  signal  rf  restricted  to  K  (rfx  = 
ri[n],  y  ne  K,  and  0  otherwise),  and  the  expectation  is  with  respect  to  x, 
and  to  O  if  one  random  d>  is  drawn  independently  for  each  x. 

The  conventional  RIP  is  known  to  be  satisfied  only  by  some  random  ma¬ 
trices,  Gaussian  and  Bernoulli  matrices  for  example,  with  high  probability. 
Eor  a  given  matrix,  checking  the  RIP  property  is  however  NP-hard  [2].  By 
contrast,  the  constant  of  the  RIP  in  expectation  (14)  can  be  measured  for 
any  matrix  via  a  fast  simulation,  the  quick  convergence  guaranteed  by  the 
concentration  of  measure  [16].  The  next  proposition,  directly  following 
from  (3)  and  (4),  further  shows  that  for  Gaussian  signals,  the  RIP  in 
expectation  has  its  constant  in  a  closed  form. 

Proposition  1.  Assume  x^  S),  ^  is  an  MxN  sensing  matrix  and  A 
is  the  optimal  decoder  (2).  Then  O  in  (O,  A)  satisfies  the  RIP  in  expectation 

(£*)  -<DRjfS<I>^(<I>S<D^)“'<I>SR^<I>^)] 

=  ck(.E^)  [rr(R;fSRj-R^S<D^(«l>S<D^)-'<DSR^)],  (15) 

where  Rk  is  anNxN  extraction  matrix  giving  px  =  0  = 

V/  G  K,  all  the  other  entries  being  zero.  The  expectation  with  respect  to  O 
is  calculated  if  one  random  O  is  drawn  independently  for  each  x. 


The  next  Theorem  shows  that  the  RIP  in  expectation  leads  to  the  MSE 
null  space  property  holding  in  equality. 

Theorem  6.  Let  x  G  be  a  random  vector  that  follows  a  certain 
distribution,  d>  an  MxN  sensing  matrix,  and  A  a  decoder.  Assume 


Ex,(o)  ^  ^  ^x,(o)  WBk^  II2  ^  some  ^  C  {1, . . . ,  V}.  Assume 

that  O  in  (0,A)  has  the  RIP  in  expectation  in  K  with  constant  ax  >  0, 
and  in  =  {\,. . .  ,N}\K  with  constant  bx  >  0.- 


W^bkWI 


=  ax, 


^x,(a))  II2 

^x,(3))  WBkc  II2 


Then  d>  satisfies 

^x,(O)hll2=<^0^x,(O)ll^/^c| 
where  Cq  =  l-\-bx/ax-  In  particular,  if  K=  {\,... 


=  bK,  V?7=x  — AOx.  (16) 


b  (17) 

k},  then  d>  satisfies  the 


MSE  null  space  property  of  order  k,  which  holds  with  equality, 

£x,(®)lhll2=Coai({»?})i  (18) 

Let  us  check  the  MSE  null  space  property  constant  Cq  of  different 
sensing  matrices  in  SCS  for  Gaussian  signals  x  G  ^  ./L(0,S),  assuming 
the  eigenvalues  of  S  follow  a  power  decay  (6)  with  a  =  —3,  and  V  =  64. 
Gaussian,  Bernoulli  and  random  subsampling  matrices  d>  of  size  MxN 
are  considered,  and  the  optimal  and  linear  decoder  A  (2)  is  applied  to 
reconstruct  the  signals.  Eor  each  matrix  distribution,  a  different  random 
matrix  realization  O  is  applied  to  sense  each  signal  x.  Note  that  since  the 
random  subsampling  matrix  d>,  each  row  containing  one  entry  with  value 
1  at  a  random  position  and  0  otherwise,  has  the  maximal  coherence  with 
the  canonical  basis,  this  matrix  is  not  suitable  for  directly  sensing  x  [5], 
and  is  replaced  by  in  the  simulation,  with  ^  a  DCT  basis  having  low 
coherence  with  O. 

Eigure  2  (a)  plots  Cq  =  l+bx/ax,  obtained  by  simulating  (16),  with  k  = 
10,  for  different  values  of  M.  When  the  number  M  of  SCS  measurements 
increases,  the  reconstruction  error  of  SCS  decreases,  resulting  in  a  smaller 
ratio  over  the  best-^  term  approximation  error  with  a  fixed  k.  Gaussian  and 
Bernoulli  matrices  lead  to  similar  Cq  values,  slightly  smaller  than  that  of 
random  subsampling  matrices.  Eigure  2  (b)  plots  Cq,  as  a  function  of  k, 
with  M  =  k.  Gaussian  and  Bernoulli  matrices  lead  to  similar  Cq  ^  4.5  that 
varies  little  with  k.  Eor  random  subsampling  matrices  Cq  slowly  increases, 
almost  linearly,  and  is  equal  to  5.5  for  a  typical  value  ^  =  10,  about  20% 
larger  than  that  of  Gaussian  and  Bernoulli  matrices. 

Erom  Corollary  2  and  Theorem  6,  we  obtain  the  next  concluding  result: 


Theorem  7.  Assume  x  ^  ./C(0,S).  Let  ^  be  an  MxN  sensing  matrix  and 
A  the  optimal  and  linear  decoder  (2).  Then  O  satisfies  the  MSE  instance 
optimality  of  order  k  (8)  with  constant  Cq  =4(1  -\-bx/cix),  ax  and  bx 


Fig.  2.  The  MSE  null  space  property  constant  Cq  (18)  of  Gaussian,  Bernoulli, 
and  random  subsampling  matrices,  as  a  function  of  M  (left),  and  of  k  with 
M  =  k  (right). 

given  in  (16). 

VI.  SCS  WITH  GMM 
-  EXPERIMENTS  ON  REAL  IMAGE  DATA 

This  section  applies  SCS  with  GMM  in  real  image  sensing  and  compares 
it  with  conventional  CS  based  on  sparse  models. 

Following  [18],  an  image  is  decomposed  into  y/N  x  y/N  =  8x8  patches 
{x/}i</</  assumed  to  follow  a  GMM:  There  exist  K  Gaussian  distributions 
{^{kik^^k)}\<k<K^  and  each  image  patch  x/  is  independently  drawn  from 
one  of  these  Gaussians  with  an  unknown  index  k.  SCS  samples  each  patch 
yi  =  O/X/,  with  a  possibly  different  O/  for  each  x/.  The  decoding  scheme 

(i)  estimates  {{iik,'^k)}i<k<K^  and  (ii)  identifies  the  Gaussian  distribution 
ki  that  generates  the  /-th  patch,  and  reconstructs  x/  with  the  optimal  and 
linear  decoder  (2)  associated  to  the  appropriate  Gaussian,  V/.  The  resulting 
piecewise  linear  decoder  is  implemented  with  a  computationally  efficient 
MAP-EM  algorithm  alternating  between  (i)  and  (ii)  [18].  The  algorithm  has 
fast  convergence  and  the  MAP  probability  of  the  measured  data  increases 
as  the  algorithm  iterates  [18]  (refer  to  [18]  Sec.  2  for  more  details). 

The  dictionary  for  conventional  CS  is  learned  with  K-SVD  [1]  from 
720,000  image  patches,  extracted  from  the  entire  standard  Berkeley  seg¬ 
mentation  database  containing  300  natural  images  [15].  The  decoder  is 
calculated  with  an  li  minimization. 


(a)  (b) 

Fig.  3.  (a).  Patch  PSNR  (dB)  vs  sampling  rate  for  SCS  and  CS  using  Gaussian 
and  random  subsampling  sensing  matrices,  (b)  Image  PSNR  (dB)  vs  sampling 
rate,  for  SCS  using  Gaussian  sensing  matrices  with  non-overlapping  reconstruc¬ 
tion,  and  subsampling  random  matrices  with  overlapping  reconstruction. 

Figure  3  (a)  shows  the  sensing  performance  on  about  260,000  (sliding) 
patches,  regarded  as  signals,  extracted  from  the  standard  image  Lena,  as 
shown  in  Figure  4.  The  PSNRs  generated  by  SCS  and  CS  using  Gaussian 
and  random  subsampling  sensing  matrices,  one  independent  realization  for 
each  patch,  are  plotted  as  a  function  of  the  sampling  rate  M/A.  At  the  same 
sampling  rate,  SCS  outperforms  SC.  The  gain  increases  from  about  0.5  dB 
at  very  low  sampling  rates  (M/N  ^  O.l),  learning  GMM  from  the  poor- 
quality  measured  data  being  more  challenging,  to  more  than  3.5  dB  at  high 
sampling  rates  (M/N  ^0.5).  (SC  using  an  “oracle”  dictionary  learned  from 
the  full  Lena  itself,  undoable  in  practice,  improves  its  performance  from 
0.2  dB  at  low  sampling  rates  to  1.3  dB  at  high  sample  rates,  still  lower  than 
SCS.)  For  both  SCS  and  CS,  Gaussian  and  random  subsampling  matrices 
lead  to  similar  PSNRs  at  low  sampling  rates  (M/N  <  0.25),  and  at  higher 
sampling  rates  the  former  gains  about  0.5  dB.  Recall  that  SCS  is  not  just 
more  accurate  and  significantly  faster,  but  also  only  uses  the  compressed 
image,  while  CS  uses  a  pre-learned  dictionary  from  a  large  database. 

Figure  3  (b)  shows  the  sensing  performance  on  the  whole  Lena  image. 
The  sensing  is  performed  on  non-overlapped  patches.  Random  subsampling 
matrices,  which  are  diagonal  operators  (one  non-zero  entry  per  row), 
have  the  advantage  of  being  able  to  lead  to  reconstruction  on  overlapped 
patches,  averaging  the  overlapped  reconstructed  patches  further  improving 
the  image  estimation.  The  PSNRs  generated  by  SCS  using  random  sub¬ 
sampling  matrices  and  overlapped  reconstruction  are  plotted,  in  comparison 
with  those  obtained  using  Gaussian  sensing  matrices  and  non-overlapped 


reconstruction.  The  former  improves  from  about  3.5  to  1.5  dB,  at  a  cost 
of  A  =  64  times  computation.  As  illustrated  in  Figure  4,  the  overlapped 
reconstruction  removes  the  block  artifacts  and  considerably  improves  the 
reconstructed  image. 


Lena  Zoom  (original)  No.-ovl.  rec.  30.54  dB  Ovl.  rec.  33.26  dB 

Fig.  4.  From  left  to  right.  Lena,  a  zoomed  crop,  reconstructed  image  by  SCS 
using  Gaussian  sensing  matrices  and  non-overlapping  reconstruction,  and  by 
SCS  using  subsampling  random  matrices  and  overlapping  reconstruction.  The 
image  is  sensed  on  non-overlapped  patches  at  a  sampling  rate  of  M/N  =  0.25. 
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